feat: deterministic replay for recorded RLM runs #21
Merged
Conversation
Enable replaying previously recorded runs without making live LLM calls. Recorded LLM responses are stored as trace events and consumed in order during replay, re-executing all eval'd code deterministically.

New modules:
- RLM.Replay — orchestrator with patch support for code substitution
- RLM.Replay.Tape — builds ordered response sequences from EventLog
- RLM.Replay.LLM — LLM behaviour impl using process-dict tape state

Recording infrastructure:
- enable_replay_recording config flag (default: false)
- [:rlm, :llm, :response, :recorded] telemetry event
- original_context/query stored in node_start events (depth-0)
- replay_patches field on the Worker struct for code patching

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When a replay patch causes extra iterations beyond what the tape recorded, the :live fallback switches to a real LLM module instead of returning an error. The fallback module is configurable via the :config option's llm_module key.

New module: RLM.Replay.FallbackLLM — tries the tape first, delegates to a live LLM module when entries are exhausted.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
## Summary

Adds the ability to record LLM responses during a run and replay them later without making live API calls. This enables:

- replaying recorded runs deterministically, with optional code patches at chosen iterations
- a `:live` fallback to a real LLM when a patch pushes the run past the recorded tape

## How it works
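At a glance, the replay entry points look like this (a sketch using only calls that appear in this description; `run_id` identifies a run recorded with `enable_replay_recording: true`):

```elixir
# Deterministic replay of a recorded run: no live API calls.
RLM.replay(run_id)

# Replace the code at iteration 0 before eval; the tape entry is
# still consumed so later iterations stay aligned.
RLM.replay(run_id, patch: %{0 => "new_code"})

# Switch to live LLM calls if patched code outruns the recorded tape.
RLM.replay(run_id, fallback: :live, config: [llm_module: RLM.LLM])
```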
### Recording

When `enable_replay_recording: true` is set, the Worker emits a `[:rlm, :llm, :response, :recorded]` telemetry event after each successful LLM call. The EventLogHandler persists the full response text and usage metadata as `:llm_response` events in both the in-memory Agent and the `:dets` TraceStore. The original `context` and `query` are also stored in the `:node_start` event for depth-0 workers, so replay can recover the inputs.

### Replay
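The process-dict tape idea can be sketched with a toy module (illustrative only; `RLM.Replay.LLM`'s actual function names are not shown in this PR, only the process-dictionary approach is):

```elixir
defmodule TapeSketch do
  @moduledoc "Toy process-dictionary tape: load once, pop entries in order."
  @key :replay_tape

  # Store the ordered list of recorded responses in the calling
  # process's dictionary.
  def load(responses) when is_list(responses) do
    Process.put(@key, responses)
    :ok
  end

  # Pop the next recorded response, or report exhaustion.
  def next do
    case Process.get(@key, []) do
      [head | rest] ->
        Process.put(@key, rest)
        {:ok, head}

      [] ->
        {:error, :tape_exhausted}
    end
  end
end
```

A real behaviour implementation would return the popped entry from its chat callback instead of calling the API, and return an error (or fall through to a live module) on exhaustion.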
`RLM.replay(run_id)` builds a Tape (an ordered list of recorded responses) from the EventLog, then starts a new Worker that uses `RLM.Replay.LLM` — a process-dict-based LLM behaviour implementation that returns responses from the tape instead of calling the API. All eval'd code is re-executed normally.

### Patching
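Patch application reduces to a map lookup keyed by iteration number (a sketch of the idea; the Worker's internals aren't shown in this PR, though `replay_patches` and the `%{0 => "new_code"}` shape are):

```elixir
# replay_patches is the map passed as the :patch option, e.g. %{0 => "new_code"}.
# Fall back to the code recorded on the tape when an iteration isn't patched.
apply_patch = fn replay_patches, iteration, recorded_code ->
  Map.get(replay_patches, iteration, recorded_code)
end

"new_code" = apply_patch.(%{0 => "new_code"}, 0, "recorded_code")
"recorded_code" = apply_patch.(%{0 => "new_code"}, 1, "recorded_code")
```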
`RLM.replay(run_id, patch: %{0 => "new_code"})` replaces the code at iteration 0 before eval. The tape entry is still consumed to maintain iteration alignment.

### Fallback
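The fallback decision itself is small (a self-contained sketch; `RLM.Replay.FallbackLLM`'s real callback signature is not shown in this PR, and the real module keeps tape state in the process dict rather than an explicit list):

```elixir
# Consume the tape while entries remain; invoke the live function
# once the tape is exhausted.
next_response = fn
  [head | rest], _live_fun -> {head, rest}
  [], live_fun -> {live_fun.(), []}
end

# Tape entry available: the recorded response wins.
{"recorded", []} = next_response.(["recorded"], fn -> "live" end)

# Tape exhausted: delegate to the live function.
{"live", []} = next_response.([], fn -> "live" end)
```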
`RLM.replay(run_id, fallback: :live, config: [llm_module: RLM.LLM])` uses `RLM.Replay.FallbackLLM`, which consumes tape entries first and switches to live LLM calls when exhausted. This handles the case where a patch causes extra iterations beyond the recorded tape length.

## New modules
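The Tape builder's core job, recovering an ordered response list from the event stream, can be sketched as a filter (the `{type, payload}` event shape here is an assumption; the real `from_events/1` reads EventLog/TraceStore entries):

```elixir
# Keep only :llm_response events, preserving recorded order.
build_tape = fn events ->
  for {:llm_response, response} <- events, do: response
end

events = [
  {:node_start, %{query: "q"}},
  {:llm_response, "answer 1"},
  {:eval, "code"},
  {:llm_response, "answer 2"}
]

["answer 1", "answer 2"] = build_tape.(events)
```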
- `RLM.Replay` — `replay/2` with patch/fallback/config support
- `RLM.Replay.Tape` — `from_events/1` builder from EventLog/TraceStore
- `RLM.Replay.LLM` — tape-backed LLM behaviour implementation
- `RLM.Replay.FallbackLLM` — tape first, live LLM on exhaustion

## Modified modules
- `RLM.Config` — `enable_replay_recording` field (default: `false`)
- `RLM.Worker` — `replay_patches` struct field, tape loading in init, patch application before eval, LLM response recording telemetry
- `RLM.Telemetry` — `[:rlm, :llm, :response, :recorded]` event
- `RLM.Telemetry.EventLogHandler` — `:llm_response` events + `original_context`/`original_query` in `:node_start`
- `RLM` — `replay/2` public API, updated boundary exports

## Design decisions
- The `RLM.LLM` behaviour's `chat/4` doesn't have a replay-state argument. Rather than changing the behaviour (and breaking all implementations), the tape lives in the Worker's process dict, matching `RLM.Eval`'s existing pattern.
- `enable_replay_recording` defaults to `false`, so recording is opt-in.

## Test plan
- `llm_response` events stored when the flag is enabled, skipped when disabled
- `original_context`/`original_query` stored in `node_start` events
- `:live` falls back to the real LLM when the tape is exhausted
- `:error` (default) returns an error when exhausted
- `RLM.replay/2` delegates correctly
- `mix compile --warnings-as-errors` clean
- `mix format --check-formatted` clean
- `mix docs` — no new warnings

🤖 Generated with Claude Code